Explore the recent global developments with R

Today, you will load a filtered gapminder dataset - with a subset of data on global development from 1952 - 2007 in increments of 5 years - to capture the period between the Second World War and the Global Financial Crisis.

Your task: Explore the data and visualise it in both static and animated ways, providing answers and solutions to the 7 questions/tasks below.

Get the necessary packages

First, install and load the relevant packages: ‘tidyverse’, ‘gganimate’, and ‘gapminder’.
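The chunk that produced the output below is not shown in the knitted document. A minimal sketch of what it could look like follows; the helper variable names `pkgs` and `to_install` are illustrative, not from the original. Rendering GIFs with gganimate usually also relies on a renderer package such as gifski being installed, which is assumed here rather than checked.

```r
# Packages this exercise relies on
pkgs <- c("tidyverse", "gganimate", "gapminder")

# Install only the packages that are not yet available (a one-time step)
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) install.packages(to_install)

# Attach the packages for this session
library(tidyverse)
library(gganimate)
library(gapminder)
```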

## ── Attaching packages ───────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.2
## ── Conflicts ──────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Warning: package 'gganimate' was built under R version 3.6.2

Look at the data

First, see which specific years are actually represented in the dataset and what variables are being recorded for each country. Note that when you run the cell below, Rmarkdown will give you two results - one for each line - that you can flip between.

unique(gapminder$year)
##  [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
head(gapminder)
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
gapminder <- as.data.frame(gapminder)

The dataset contains information on each country in the sampled year, its continent, life expectancy, population, and GDP per capita.
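As a quick sanity check on that structure, the sketch below counts the distinct countries and years and verifies how many observations each country has. The object name `obs_per_country` is just an illustrative choice.

```r
library(dplyr)
library(gapminder)

# Number of distinct countries and sampled years in the filtered dataset
gapminder %>%
  summarise(n_countries = n_distinct(country),
            n_years = n_distinct(year))

# Observations per country: a complete panel has one row per sampled year
obs_per_country <- gapminder %>% count(country, name = "n_obs")
range(obs_per_country$n_obs)
```

If the smallest and largest counts are equal, every country is observed in every sampled year.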

Let’s plot all the countries in 1952.

theme_set(theme_bw())  # set theme to white background for better visibility

ggplot(subset(gapminder, year == 1952), aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10() 

We see an interesting spread with an outlier to the right. Answer the following questions, please:

Q1. Why does it make sense to have a log10 scale on the x axis?

#Answer Q1 GDP per capita spans several orders of magnitude. On a linear x axis the single very rich outlier would stretch the scale, and most countries would be crammed together at the far left. The log10 transformation ‘squeezes’ the richest countries closer to the poorer ones, so the whole spread becomes readable.
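To see the effect directly, one option is to draw the same 1952 data with and without the transformation. This is only a sketch; the object names `gap_1952`, `base_plot`, `p_linear` and `p_log` are made up for the illustration.

```r
library(ggplot2)
library(gapminder)

gap_1952 <- subset(gapminder, year == 1952)

# Spread of GDP per capita in raw units and in orders of magnitude
range(gap_1952$gdpPercap)
diff(log10(range(gap_1952$gdpPercap)))

base_plot <- ggplot(gap_1952, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point()

# Same data, two x scales
p_linear <- base_plot + labs(title = "Linear x axis")
p_log <- base_plot + scale_x_log10() + labs(title = "log10 x axis")

p_linear
p_log
```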

Q2. What country is the richest in 1952 (far right on the x axis)?

gapminder %>% filter(year==1952) %>% arrange(desc(gdpPercap))
##                      country continent year lifeExp       pop   gdpPercap
## 1                     Kuwait      Asia 1952  55.565    160000 108382.3529
## 2                Switzerland    Europe 1952  69.620   4815000  14734.2327
## 3              United States  Americas 1952  68.440 157553000  13990.4821
## 4                     Canada  Americas 1952  68.750  14785584  11367.1611
## 5                New Zealand   Oceania 1952  69.390   1994794  10556.5757
## 6                     Norway    Europe 1952  72.670   3327728  10095.4217
## 7                  Australia   Oceania 1952  69.120   8691212  10039.5956
## 8             United Kingdom    Europe 1952  69.180  50430000   9979.5085
## 9                    Bahrain      Asia 1952  50.939    120447   9867.0848
## 10                   Denmark    Europe 1952  70.780   4334000   9692.3852
## 11               Netherlands    Europe 1952  72.130  10381988   8941.5719
## 12                    Sweden    Europe 1952  71.860   7124673   8527.8447
## 13                   Belgium    Europe 1952  68.000   8730405   8343.1051
## 14                 Venezuela  Americas 1952  55.088   5439568   7689.7998
## 15                   Iceland    Europe 1952  72.490    147962   7267.6884
## 16                   Germany    Europe 1952  67.500  69145952   7144.1144
## 17                    France    Europe 1952  67.410  42459667   7029.8093
## 18            Czech Republic    Europe 1952  66.870   9125183   6876.1403
## 19              Saudi Arabia      Asia 1952  39.875   4005677   6459.5548
## 20                   Finland    Europe 1952  66.550   4090500   6424.5191
## 21                   Austria    Europe 1952  66.800   6927772   6137.0765
## 22                 Argentina  Americas 1952  62.485  17876956   5911.3151
## 23                   Uruguay  Americas 1952  66.071   2252965   5716.7667
## 24                      Cuba  Americas 1952  59.421   6007797   5586.5388
## 25                   Hungary    Europe 1952  64.030   9504000   5263.6738
## 26                   Ireland    Europe 1952  66.910   2952156   5210.2803
## 27           Slovak Republic    Europe 1952  64.360   3558137   5074.6591
## 28                     Italy    Europe 1952  65.940  47666000   4931.4042
## 29                   Lebanon      Asia 1952  55.928   1439529   4834.8041
## 30              South Africa    Africa 1952  45.009  14264935   4725.2955
## 31                     Gabon    Africa 1952  37.003    420702   4293.4765
## 32                  Slovenia    Europe 1952  65.570   1489518   4215.0417
## 33                      Iraq      Asia 1952  45.320   5441766   4129.7661
## 34                    Israel      Asia 1952  65.390   1620914   4086.5221
## 35                    Poland    Europe 1952  61.310  25730551   4029.3297
## 36                     Chile  Americas 1952  54.745   6377619   3939.9788
## 37                     Spain    Europe 1952  64.940  28549870   3834.0347
## 38                      Peru  Americas 1952  43.902   8025700   3758.5234
## 39                    Serbia    Europe 1952  57.996   6860147   3581.4594
## 40                    Greece    Europe 1952  65.860   7733250   3530.6901
## 41                   Ecuador  Americas 1952  48.357   3548753   3522.1107
## 42                    Angola    Africa 1952  30.015   4232095   3520.6103
## 43                    Mexico  Americas 1952  50.789  30144317   3478.1255
## 44                     Japan      Asia 1952  63.030  86459025   3216.9563
## 45                   Romania    Europe 1952  61.050  16630000   3144.6132
## 46                   Croatia    Europe 1952  61.210   3882229   3119.2365
## 47                 Nicaragua  Americas 1952  42.314   1165790   3112.3639
## 48               Puerto Rico  Americas 1952  64.280   2227000   3081.9598
## 49                  Portugal    Europe 1952  59.820   8526050   3068.3199
## 50          Hong Kong, China      Asia 1952  60.960   2125900   3054.4212
## 51               El Salvador  Americas 1952  45.262   2042865   3048.3029
## 52                      Iran      Asia 1952  44.869  17272000   3035.3260
## 53       Trinidad and Tobago  Americas 1952  59.100    662850   3023.2719
## 54                   Jamaica  Americas 1952  58.530   1426095   2898.5309
## 55                   Reunion    Africa 1952  52.724    257700   2718.8853
## 56                   Bolivia  Americas 1952  40.414   2883315   2677.3263
## 57                  Djibouti    Africa 1952  34.812     63149   2669.5295
## 58                Montenegro    Europe 1952  59.164    413834   2647.5856
## 59                Costa Rica  Americas 1952  57.206    926317   2627.0095
## 60                    Panama  Americas 1952  55.191    940080   2480.3803
## 61                   Algeria    Africa 1952  43.077   9279525   2449.0082
## 62                  Bulgaria    Europe 1952  59.600   7274900   2444.2866
## 63                 Guatemala  Americas 1952  42.023   3146381   2428.2378
## 64                   Namibia    Africa 1952  41.725    485831   2423.7804
## 65                     Libya    Africa 1952  42.723   1019729   2387.5481
## 66                 Singapore      Asia 1952  60.396   1127000   2315.1382
## 67                  Honduras  Americas 1952  41.912   1517453   2194.9262
## 68                  Colombia  Americas 1952  50.643  12350771   2144.1151
## 69               Congo, Rep.    Africa 1952  42.111    854885   2125.6214
## 70                    Brazil  Americas 1952  50.917  56602560   2108.9444
## 71                    Turkey    Europe 1952  43.585  22235677   1969.1010
## 72                 Mauritius    Africa 1952  50.986    516556   1967.9557
## 73                  Paraguay  Americas 1952  62.649   1555876   1952.3087
## 74                     Haiti  Americas 1952  37.579   3201488   1840.3669
## 75                  Malaysia      Asia 1952  48.463   6748378   1831.1329
## 76                      Oman      Asia 1952  37.578    507833   1828.2303
## 77                   Morocco    Africa 1952  42.873   9939217   1688.2036
## 78                     Syria      Asia 1952  45.883   3661549   1643.4854
## 79                     Sudan    Africa 1952  38.635   8504667   1615.9911
## 80                   Albania    Europe 1952  55.230   1282697   1601.0561
## 81                    Jordan      Asia 1952  43.158    607914   1546.9078
## 82        West Bank and Gaza      Asia 1952  43.160   1030585   1515.5923
## 83                   Tunisia    Africa 1952  44.600   3647735   1468.4756
## 84                   Senegal    Africa 1952  37.278   2755589   1450.3570
## 85                Madagascar    Africa 1952  36.681   4762912   1443.0117
## 86                     Egypt    Africa 1952  41.893  22223309   1418.8224
## 87        Dominican Republic  Americas 1952  45.928   2491346   1397.7171
## 88             Cote d'Ivoire    Africa 1952  40.477   2977019   1388.5947
## 89               Philippines      Asia 1952  47.752  22438691   1272.8810
## 90                    Taiwan      Asia 1952  58.500   8550362   1206.9479
## 91                      Chad    Africa 1952  38.092   2682462   1178.6659
## 92                  Cameroon    Africa 1952  38.523   5009067   1172.6677
## 93                 Swaziland    Africa 1952  41.407    290243   1148.3766
## 94                    Zambia    Africa 1952  42.038   2672000   1147.3888
## 95                   Somalia    Africa 1952  32.978   2526994   1135.7498
## 96                   Comoros    Africa 1952  40.715    153936   1102.9909
## 97          Korea, Dem. Rep.      Asia 1952  50.056   8865488   1088.2778
## 98                 Sri Lanka      Asia 1952  57.593   7982342   1083.5320
## 99                   Nigeria    Africa 1952  36.324  33119096   1077.2819
## 100 Central African Republic    Africa 1952  35.463   1291695   1071.3107
## 101                    Benin    Africa 1952  38.223   1738315   1062.7522
## 102              Korea, Rep.      Asia 1952  47.453  20947571   1030.5922
## 103   Bosnia and Herzegovina    Europe 1952  53.820   2791000    973.5332
## 104                    Ghana    Africa 1952  43.149   5581001    911.2989
## 105             Sierra Leone    Africa 1952  30.331   2143249    879.7877
## 106    Sao Tome and Principe    Africa 1952  46.471     60011    879.5836
## 107                     Togo    Africa 1952  38.596   1219113    859.8087
## 108                    Kenya    Africa 1952  42.270   6464046    853.5409
## 109                 Botswana    Africa 1952  47.622    442308    851.2411
## 110                 Mongolia      Asia 1952  42.244    800663    786.5669
## 111              Yemen, Rep.      Asia 1952  32.548   4963829    781.7176
## 112         Congo, Dem. Rep.    Africa 1952  39.143  14100005    780.5423
## 113              Afghanistan      Asia 1952  28.801   8425333    779.4453
## 114                    Niger    Africa 1952  37.444   3379468    761.8794
## 115                 Thailand      Asia 1952  50.848  21289402    757.7974
## 116                Indonesia      Asia 1952  37.468  82052000    749.6817
## 117               Mauritania    Africa 1952  40.543   1022556    743.1159
## 118                   Uganda    Africa 1952  39.978   5824797    734.7535
## 119                 Tanzania    Africa 1952  41.215   8322925    716.6501
## 120                 Pakistan      Asia 1952  43.436  41346560    684.5971
## 121               Bangladesh      Asia 1952  37.484  46886859    684.2442
## 122                  Vietnam      Asia 1952  40.412  26246839    605.0665
## 123                  Liberia    Africa 1952  38.480    863308    575.5730
## 124                    India      Asia 1952  37.373 372000000    546.5657
## 125                    Nepal      Asia 1952  36.157   9182536    545.8657
## 126             Burkina Faso    Africa 1952  31.975   4469979    543.2552
## 127                   Guinea    Africa 1952  33.609   2664249    510.1965
## 128                   Rwanda    Africa 1952  40.000   2534927    493.3239
## 129                   Gambia    Africa 1952  30.000    284320    485.2307
## 130               Mozambique    Africa 1952  31.286   6446316    468.5260
## 131                     Mali    Africa 1952  33.685   3838168    452.3370
## 132                 Zimbabwe    Africa 1952  48.451   3080907    406.8841
## 133                    China      Asia 1952  44.000 556263527    400.4486
## 134        Equatorial Guinea    Africa 1952  34.482    216964    375.6431
## 135                   Malawi    Africa 1952  36.256   2917802    369.1651
## 136                 Cambodia      Asia 1952  39.417   4693836    368.4693
## 137                 Ethiopia    Africa 1952  34.078  20860941    362.1463
## 138                  Burundi    Africa 1952  39.031   2445618    339.2965
## 139                  Myanmar      Asia 1952  36.319  20092996    331.0000
## 140                  Eritrea    Africa 1952  35.928   1438760    328.9406
## 141            Guinea-Bissau    Africa 1952  32.500    580653    299.8503
## 142                  Lesotho    Africa 1952  42.138    748747    298.8462

#Answer Q2 From the output above you can see that the highest GDP per capita in 1952 was 108382.35, in Kuwait.
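If you only need the top row rather than the full sorted table, a shorter alternative is sketched below. It assumes dplyr 1.0.0 or newer, where `slice_max()` is available; `richest_1952` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Keep only the country with the highest GDP per capita in 1952
richest_1952 <- gapminder %>%
  filter(year == 1952) %>%
  slice_max(gdpPercap, n = 1) %>%
  select(country, continent, gdpPercap)

richest_1952
```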

You can generate a similar plot for 2007 and compare the differences.

ggplot(subset(gapminder, year == 2007), aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10() 

The black bubbles are a bit hard to read. The comparison would be easier with a bit more visual differentiation.

Q3. Can you differentiate the continents by color and fix the axis labels?

#Answer Q3 You can differentiate between the continents by adding a color argument to the aesthetics, i.e. color=continent. You add labels to the x and y axes with the labs() function, where you specify what each axis should be called. You can also set the legend titles in labs(): the size= argument names the size legend and the colour= argument names the color legend. If you e.g. had ‘shape’ as an aesthetic in your plot, you could write shape=“title” to name that legend.

ggplot(subset(gapminder, year == 2007), aes(gdpPercap, lifeExp, size = pop, color = continent)) + # color by continent
  geom_point() +
  scale_x_log10() +
  labs(x = "GDP per capita", y = "Life expectancy", colour = "Continent", size = "Population size") # axis and legend labels
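For a static side-by-side comparison of 1952 and 2007, one possibility is to keep both years and split the plot into panels with `facet_wrap()`. This is a sketch, not part of the original exercise; `gap_ends` and `p_compare` are illustrative names and the `alpha` value is an arbitrary choice to reduce overplotting.

```r
library(ggplot2)
library(gapminder)

# Keep only the first and last sampled years
gap_ends <- subset(gapminder, year %in% c(1952, 2007))

p_compare <- ggplot(gap_ends, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +
  facet_wrap(~ year) + # one panel per year, shared axes
  labs(x = "GDP per capita", y = "Life expectancy",
       color = "Continent", size = "Population size")

p_compare
```

Because the panels share both axes, shifts to the right (richer) and upward (longer-lived) can be read off directly.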

Q4. What are the five richest countries in the world in 2007?

gapminder %>% filter(year==2007) %>% arrange(desc(gdpPercap)) %>% head(n=5)
##         country continent year lifeExp       pop gdpPercap
## 1        Norway    Europe 2007  80.196   4627926  49357.19
## 2        Kuwait      Asia 2007  77.588   2505559  47306.99
## 3     Singapore      Asia 2007  79.972   4553009  47143.18
## 4 United States  Americas 2007  78.242 301139947  42951.65
## 5       Ireland    Europe 2007  78.885   4109086  40676.00

#Answer Q4 The five richest countries in 2007 were Norway, Kuwait, Singapore, United States and Ireland.

Make it move!

The comparison would be easier if we had the two graphs together, animated. We have a lovely tool in R to do this: the gganimate package. And there are two ways of animating the gapminder ggplot.

Option 1: Animate using transition_states()

The first step is to create the object-to-be-animated

anim <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10()  # convert x to log scale
anim

This plot collates all the points across time. The next step is to split it into years and animate it. This may take some time, depending on the processing power of your computer (and other things you are asking it to do). Beware that the animation might appear in the ‘Viewer’ pane, not in this rmd preview. You need to knit the document to get the viz inside an html file.

anim + transition_states(year, 
                      transition_length = 1,
                      state_length = 1)

Notice how the animation moves jerkily, ‘jumping’ from one year to the next 12 times in total. This is a bit clunky, which is why it’s good we have another option.
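If rendering is slow, you can control the number of frames yourself and save the result to a file. The sketch below assumes the gifski package is installed so that gganimate can write a GIF; the values for `nframes` and `fps` and the output file name are illustrative choices, not recommendations from the original exercise.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim_states <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10() +
  transition_states(year, transition_length = 1, state_length = 1)

# Fewer frames render faster; nframes and fps are illustrative values
rendered <- animate(anim_states, nframes = 60, fps = 10)

# Save the rendered animation; the file name is a hypothetical example
out_file <- file.path(tempdir(), "gapminder_states.gif")
anim_save(out_file, animation = rendered)
```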

Option 2: Animate using transition_time()

This option smooths the transition between different ‘frames’, because it treats year as continuous time: it interpolates between the sampled years and adds transitional frames where there are gaps in the time series.

anim2 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10() + # convert x to log scale
  transition_time(year)
anim2

The smoother movement in Option 2 will be much more noticeable if you add a title to the chart that pages through the years corresponding to each frame.

Q5 Can you add a title to one or both of the animations above that will change in sync with the animation? [hint: search labeling for transition_states() and transition_time() functions respectively]

anim3 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() + # convert x to log scale
  transition_states(year, transition_length = 1, state_length = 1) +
  labs(title = 'Year: {closest_state}') # title follows the current state

anim3

#Answer Q5 transition_states() treats each value of the year variable as one ‘state’, and transition_length and state_length set the relative time spent moving between states and pausing on each one. In the title, {closest_state} is replaced with the state closest to the current frame, so the title shows the year that the animation is at or nearest to.
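The same idea works for the transition_time() version, where the label variable is `frame_time` instead of `closest_state`. A minimal sketch, with `anim_time` as an illustrative object name:

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim_time <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() + # convert x to log scale
  transition_time(year) +
  # frame_time holds the (interpolated) time of the current frame
  labs(title = "Year: {frame_time}")
```

Printing `anim_time` renders the animation, just like `anim2` above.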

Q6 Can you make the axes’ labels and units more readable? Consider expanding the abbreviated labels as well as the scientific notation in the legend and x axis to whole numbers. [hint: search disabling scientific notation]

options(scipen=999) #setting a value for how likely it is that scientific notation is triggered

anim4 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() + # convert x to log scale
  transition_states(year, transition_length = 1, state_length = 1) +
  labs(title = 'Year: {closest_state}', x = "GDP per capita ($) on a log scale", y = "Life expectancy (years)", colour = "Continent", size = "Population size")
anim4

#Answer Q6 Yes. The axis and legend titles are expanded in labs(), and scientific notation is disabled with the options() function. The scipen option is a penalty that R applies when it chooses between fixed and scientific notation: a large positive value such as 999 makes R prefer whole numbers (fixed notation) in practically all cases.
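An alternative that does not change a global option is to format the labels per scale with the scales package (installed alongside ggplot2). The sketch below uses a static 2007 plot for brevity and assumes a scales version that provides `label_comma()`; `p_labels` is an illustrative name.

```r
library(ggplot2)
library(gapminder)

p_labels <- ggplot(subset(gapminder, year == 2007),
                   aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  # write axis and legend numbers in full, with thousands separators
  scale_x_log10(labels = scales::label_comma()) +
  scale_size(labels = scales::label_comma()) +
  labs(x = "GDP per capita ($) on a log scale", y = "Life expectancy (years)",
       color = "Continent", size = "Population size")

p_labels
```

The same two scale calls can be added to the animated plots in place of `scale_x_log10()`.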

Q7 Come up with a question you want to answer using the gapminder data and write it down. Then, create a data visualisation that answers the question and explain how your visualisation answers the question. (Example: you wish to see what the mean life expectancy across the continents was in the year you were born versus your parents’ birth years). [hint: if you wish to have more data than is in the filtered gapminder, you can either load the gapminder_unfiltered dataset or download more at https://www.gapminder.org/data/ ]

#Answer Q7 Question: How did GDP per capita develop in the two Nordic countries Norway and Denmark in the period 1952-2007?

Here I use the cowplot package to plot the two plots next to each other and the viridis package to get some pretty colors on it.

#install.packages("cowplot") # package for combining two plots in one figure
#install.packages("viridis")  # package for color scales
library(viridis) 
## Loading required package: viridisLite
library(cowplot)
## Warning: package 'cowplot' was built under R version 3.6.2
summary(gapminder) # to get summary stats for each column in the gapminder data
##         country        continent        year         lifeExp     
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60  
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20  
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71  
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47  
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85  
##  Australia  :  12                  Max.   :2007   Max.   :82.60  
##  (Other)    :1632                                                
##       pop               gdpPercap       
##  Min.   :     60011   Min.   :   241.2  
##  1st Qu.:   2793664   1st Qu.:  1202.1  
##  Median :   7023596   Median :  3531.8  
##  Mean   :  29601212   Mean   :  7215.3  
##  3rd Qu.:  19585222   3rd Qu.:  9325.5  
##  Max.   :1318683096   Max.   :113523.1  
## 
#using the theme() function to adjust the text sizes in the plots

plot_1 <- gapminder %>%
  filter(country == "Denmark") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_point(aes(color = gdpPercap), show.legend = FALSE) +
  labs(x = "Year", y = "GDP per capita", title = "Economic development in Denmark", subtitle = "1952-2007") +
  scale_color_viridis(option = "D") +
  theme_minimal() + # complete theme first, so the tweaks below are not overwritten
  theme(plot.title = element_text(size = 10), plot.subtitle = element_text(size = 8), axis.title = element_text(size = 8))

plot_2 <- gapminder %>%
  filter(country == "Norway") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_point(aes(color = gdpPercap), show.legend = FALSE) +
  labs(x = "Year", y = "GDP per capita", title = "Economic development in Norway", subtitle = "1952-2007") +
  scale_color_viridis(option = "D") +
  theme_minimal() + # complete theme first, so the tweaks below are not overwritten
  theme(plot.title = element_text(size = 10), plot.subtitle = element_text(size = 8), axis.title = element_text(size = 8))

plot_grid(plot_1, plot_2, labels="AUTO")
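Each panel in the grid gets its own y axis, so the two countries are not directly comparable by eye. One possible variant, sketched below, draws both countries in a single plot with a shared axis; the names `nordic` and `p_nordic` are illustrative and this is an alternative, not the original answer.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Keep only the two countries of interest
nordic <- gapminder %>%
  filter(country %in% c("Denmark", "Norway"))

p_nordic <- ggplot(nordic, aes(x = year, y = gdpPercap, color = country)) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  labs(x = "Year", y = "GDP per capita", color = "Country",
       title = "Economic development in Denmark and Norway",
       subtitle = "1952-2007")

p_nordic
```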
